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1. INTRODUCTION 

Professor Brillinger is to be congratulated on this 
paper which is both a contribution to the history of 
statistics and an introduction to statistical modeling 
using stochastic processes, a topic that continues to 
be of great relevance in theory and applications. It is 
interesting to see how many ideas have been formu- 
lated already at an early stage. In particular, I like 
the idea of "synthetic data" to judge the adequacy 
of a fitted model. With time series or spatial data, 
one typically needs only a few replicates to assess 
visually the differences between real and synthetic 
data, and so this is really a powerful tool. 

2. COMMENTS ON THE DATA EXAMPLES 

If I understand the description of the data behind 
Figure 4 correctly, the rainfall has been averaged 
over 53 seeding days. I would expect the wind speeds 
to vary from day to day, so I would use a hierarchical 
model for the wind speeds vj with a variance com- 
ponent within the same day and a variance com- 
ponent between days. The variation between days 
would lead to some variation of the time of the peak, 
and averaging would smear it out. Hence the sharp 
peak in Figure 4 is even more surprising. The only 
possibility I see for a model that produces a similar 
peak as in the actual data, is to assume a decaying 
intensity for the process of rain particles in Ticino. 

In the two population dynamics examples, no full 
probabilistic model is constructed. Only the condi- 
tional mean values and not the distribution of the 
fluctuations are considered. Least squares methods 
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are used for fitting. Moreover, the reproduction pro- 
cess is not part of the model, although the reader is 
referred to Guttorp (1980) for a treatment of births 
in the second example. From a pragmatic point, 
it can certainly be advantageous to focus on those 
parts that are of primary concern without making 
assumptions on other processes in the system. If 
only the population above a certain threshold age 
are of interest and if information about the number 
of individuals reaching the threshold age is available, 
then one does not need to model the births. On the 
other hand, as I will argue below, there is also the 
point of view that in order to understand a system, 
all relevant processes should be included. 

As an interesting complement to Example 7, I 
would like to mention the paper Jonsen, Mills Flem- 
ming and Myers (2005) which also analyses seal move- 
ment data. They use a discrete-time integrated ran- 
dom walk for the animal movements, with interpo- 
lation to accommodate irregular observation times, 
and i-distributions for the observation errors. With 
such a model, they can use state-space methodol- 
ogy to fit the model to the data, without having 
to exclude suspicious observations. Including a drift 
component to the integrated random walk is possi- 
ble, but would make the analysis more complicated. 

3. DETERMINISM AND INDETERMINISM 

Even 50 years after Neyman's work discussed in 
this paper, many fields of science are still dominated 
by deterministic models, at least in the area of envi- 
ronmental modeling where I have most experience. 
The reasons for this dominance are that scientists 
are interested in models that 

• take as much knowledge about the underlying pro- 
cesses into account as possible, 

• contribute to the understanding of these processes, 

• are transferable to similar systems, 

• allow prediction of the same system under differ- 
ent driving conditions than those observed, 

• have parameters with a clear subject matter in- 
terpretation. 
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Some of these reservations can be made against the 
analysis of the weather modification experiment de- 
scribed in the paper. No attempt is made to connect 
the data with physical knowledge about atmospheric 
processes in the alpine region on experimental days, 
and even if the model gave a satisfactory fit it would 
not be clear how the estimated velocity distribution 
could be transfered to a slightly different location. 

It is more difficult than one might think to in- 
clude all relevant knowledge about the underlying 
processes into simple statistical models. One reason 
is that this knowledge can be described mathemat- 
ically only in continuous time and space, using or- 
dinary or partial differential equations, whereas sta- 
tistical models tend to be in discrete time and space. 
Another reason is that physical models typically in- 
volve many variables that are not observed, and this 
complicates the statistical analysis. 

It is clear, however, that even the best available 
deterministic models are limited because initial con- 
ditions, boundary conditions or inputs are uncertain 
and because the computational complexity of such 
models requires essential simplifications to make them 
tractable. As a result, when fitting ordinary and par- 
tial differential equation models to data by nonlinear 
least squares, one often finds that there are system- 
atic deviations between model outputs and obser- 
vations that cannot be explained by measurement 
errors alone. Hence, in my view, the challenge for 
statisticians is to develop methods which build upon 
the deterministic models in science and at the same 
time allow to describe uncertainties in realistic ways 
or even to enhance understanding or improve model 
extrapolation. 

I concur very much with Professor Brillinger that 
statisticians should make more use of SDEs in their 
modeling. They are convenient not only because they 
can handle observations at irregular time points, but 
rather because they allow to introduce uncertainty 
into ODEs. However, one should be aware that the 
noise term has a profound impact on the behavior of 
the solutions: They are no longer smooth, but have 
infinite variation on any small interval. This is def- 
initely not realistic for the tracks of elks and seals. 
In fact most natural systems are believed to evolve 
smoothly, at least on the macroscopic scale. This 
makes it difficult to decide which statistical meth- 
ods one should use when fitting SDE models to data 
since we do not really believe in the fine structure 
of the model. 



Adding a white noise disturbance to an ODE is 
not the only way to introduce uncertainty into deter- 
ministic modeling approaches. Kennedy and O'Hagan 
(2001) and Bayarri et al. (2007) introduce — in addi- 
tion to the measurement error term — a model inad- 
equacy, or bias, term that is intended to capture 
the effect of model deficiencies on model output. 
It is usually formulated in a nonparametric form 
Gaussian process with a suitable covariance 
function. While this approach is universally appli- 
cable and can lead to more reliable uncertainty es- 
timates, its use for diagnosing possible causes for 
model deficits is limited. An alternative consists in 
replacing a constant model parameter or a deter- 
ministic input by a stochastic process model, in the 
spirit of the quote from Neyman by David Brillinger 
in Section 3.3 ". . . with coefficients that are not con- 
stants, but random variables." Such a time-varying 
parameter or input can be estimated jointly with the 
other (time-constant) parameters, and from a care- 
ful analysis of the estimated trajectory additional in- 
sight into the nature of model deficits can be gained. 
This approach has been used in the systems analysis 
literature for more than 20 years; see, for example, 
Beck (1983) or Kristensen, Madsen and J0rgensen 
(2004). These authors use a discrete-time setting 
and extended Kalman filter techniques. Tomassini et al. 
(2007) develop a MCMC algorithm that can be used 
in continuous time and without any linearization 
technique. 

To illustrate the differences between adding a noise 
term to a differential equation and making a param- 
eter time- varying, consider the simple growth model 

dxt = (3xt dt. 

The solution of the corresponding SDE 

dX t = (3X t dt + a X t dB t 

is X t = Xq exp((/5 - a 2 /2)t + aB t ). Hence not only 
the local, but also the long-time behavior is changed 
by the added noise. This SDE can be interpreted as 
saying that we add to the growth rate (5 of the de- 
terministic model a white noise term. This is hardly 
plausible biologically, and I thus prefer the version 

dp t = -l(Pt-P)dt + adB t , 

dX t = (3 t X t dt. 

A mean-reverting Ornstein-Uhlenbeck process has 
been chosen for the dynamics of fit because of its 
simplicity. It has the drawback of allowing negative 
values, but other choices are possible. 
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4. TECHNICAL DIFFICULTIES WITH 
STOCHASTIC DIFFERENTIAL EQUATIONS 

I have discussed reasons to use SDE models in 
statistics. David Brillinger has shown some simple 
and pragmatic approaches for fitting and analyzing 
them, mainly by relying on the Euler (or Euler- 
Maruyama) approximation (3). If one wants to re- 
fine this approach and consider, for instance, exact 
maximum likelihood estimation, then a number of 
technical difficulties occur. I would like to give a 
brief overview of these difficulties and modern ap- 
proaches to solve them since this is a very active 
area of research at the moment. 

4.1 Exact Observations at Discrete Time Points 

Let us consider the SDE 

dX t = fi(X t ,0)dt + a(X t ,9)dB t 

with known initial condition xq and unknown pa- 
rameter 0. If we observe the solution (Xt) at dis- 
crete time points £j, then the log likelihood can be 
written as 

log pe (U+i ~ U , x u , x ti+1 ) • 

i 

The transition densities pg(t,x,y) are, however, not 
available in closed form, and the Euler approxima- 
tion implied by (3) is often not accurate enough (the 
resulting estimator is usually not consistent if the 
distances ij+i — ti remain fixed). Numerical compu- 
tation is difficult because one has to solve a partial 
differential equation (Fokker-Planck) . This can be 
done in one dimension (see Lindstrom (2007)), but 
in higher dimensions it is not practical. Estimating 
equations other than the MLE have been considered, 
but for them one also has to compute conditional 
expectations of the form 

E e [^{x s ,x u e)\x s ) 

for s < t which often cannot be done in closed form. 

The emphasis in much of the recent work has 
been on Monte Carlo methods. In the framework 
of estimating equations, this has been developed by 
Kessler and Paredes (2002). The nice feature of their 
approach is that if one estimates the conditional ex- 
pectation by J replicates, the asymptotic variance 
of the corresponding estimator increases by a fac- 
tor (1 + 1/J). This means that a small number of 
replicates is sufficient for all practical purposes. 

Monte Carlo methods can also be used for like- 
lihood inference. A natural approach is to consider 



the values of the process (Xt) on a fine grid between 
observation times as latent variables and to use the 
Euler approximation for these smaller time steps. 
The latent values can then be integrated out us- 
ing importance sampling, or one can apply the EM- 
algorithm. Since the E-step rarely can be done an- 
alytically, one has to use a Monte Carlo method in- 
stead. For integrating the latent variables out, 
Durham and Gallant (2002) have proposed clever 
importance distributions which are crucial for the 
method to become useful. As discussed by 
Roberts and Stramer (2001), the EM-algorithm suf- 
fers from poor convergence if the diffusion coefficient 
a depends on unknown parameters. The reason for 
this is that the precision for estimating the diffu- 
sion coefficient a of an SDE goes to infinity as the 
observation points become dense in an interval, or 
in other words, the distributions of the solution of 
two SDEs with different a are mutually singular. To 
overcome this problem, Roberts and Stramer (2001) 
propose a transformation of the latent variables that 
reduces the information they contain about a. The 
same problem affects also the first approach: One 
cannot use the same importance distribution for pa- 
rameter values that correspond to different values of 
a. Hence one cannot estimate the whole likelihood 
function by a single simulation experiment. 

An entirely different approach has been used by 
Beskos, Papaspiliopoulos, Roberts and Fearnhead 
(2006). Based on exact simulation methods for dif- 
fusions, they propose several unbiased estimators of 
the likelihood function and a stochastic EM-algorithm 
The ideas in this paper are most interesting, but un- 
fortunately they cannot be used for arbitrary SDEs. 
It must be possible to transform the variables so that 
the diffusion coefficient is constant, and the drift co- 
efficient b must be derived from a potential function. 

4.2 Partial and Noisy Observations 

In many cases, we do not observe X^ exactly, but 
only random variables Y\ which are conditionally in- 
dependent and such that Y{ depends on X ti only: 

Yi | X t . ~/(y | x u )dy. 

We are then in the setup of state-space models. Par- 
ticle filter methods (see, e.g., Doucet, de Freitas and 
Gordon (2001)) can be used to estimate both the 
likelihood function and the unobserved path (Xt). 
Again, the unavailability of the transition densities 
creates additional problems. One can either extend 
the state variables by adding the values of (Xt) on a 
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fine grid between observation times, or one can use 
again ideas from exact sampling of diffusions; see 
Fearnhead, Papaspiliopoulos and Roberts (2008). 
The issue of efficient computations for estimating 
parameters remains an open problem. 

Ramsay, Hooker, Campbell and Cao (2007) have 
considered an approach to estimate simultaneously 
an approximate solution of an ODE together with 
unknown parameters 6 by minimizing 



d 
dt 



dt. 



Numerically, this minimization problem is solved by 
approximating (xt) in a spline basis. The approach 
can be viewed as MAP estimation in an SDE model 
with drift [i and constant diffusion coefficient 1 / \/2A. 
In the case of a more general diffusion coefficient, 
one can replace the second term by 



a(xt,0)~ 



d 

-rXt 
dt 



fj,(x t ,e) 



dt. 



I think that for estimating parameters, this approach 
is potentially simpler than those based on the par- 
ticle filter. 
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